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Learning  to  Sense  Sparse  Signals: 

Simultaneous  Sensing  Matrix  and  Sparsifying  Dictionary  Optimization 

Julio  Martin  Duarte-Carvajalino  and  Guillermo  Sapiro 

Department  of  Electrical  and  Computer  Engineering,  University  of  Minnesota 

Abstract-  Sparse  signals  representation,  analysis,  and  sensing,  has  received  a  lot  of  attention  in 
recent  years  from  the  signal  processing,  optimization,  and  learning  communities.  On  one  hand,  the 
learning  of  overcomplete  dictionaries  that  facilitate  a  sparse  representation  of  the  image  as  a  liner 
combination  of  a  few  atoms  from  such  dictionary,  leads  to  state-of-the-art  results  in  image  and 
video  restoration  and  image  classification.  On  the  other  hand,  the  framework  of  compressed 
sensing  (CS)  has  shown  that  sparse  signals  can  be  recovered  from  far  less  samples  than  those 
required  by  the  classical  Shannon-Nyquist  Theorem.  The  goal  of  this  paper  is  to  present  a 
framework  that  unifies  the  learning  of  overcomplete  dictionaries  for  sparse  image  representation 
with  the  concepts  of  signal  recovery  from  very  few  samples  put  forward  by  the  CS  theory.  The 
samples  used  in  CS  correspond  to  linear  projections  defined  by  a  sampling  projection  matrix.  It  has 
been  shown  that,  for  example,  a  non-adaptive  random  sampling  matrix  satisfies  the  fundamental 
theoretical  requirements  of  CS,  enjoying  the  additional  benefit  of  universality.  On  the  other  hand,  a 
projection  sensing  matrix  that  is  optimally  designed  for  a  certain  signal  class  can  further  improve 
the  reconstruction  accuracy  or  further  reduce  the  necessary  number  of  samples.  In  this  work  we 
introduce  a  framework  for  the  joint  design  and  optimization,  from  a  set  of  training  images,  of  the 
overcomplete  non-parametric  dictionary  and  the  sensing  matrix.  We  show  that  this  joint 
optimization  outperforms  both  the  use  of  random  sensing  matrices  and  those  matrices  that  are 
optimized  independently  of  the  learning  of  the  dictionary.  The  presentation  of  the  framework  and 
its  efficient  numerical  optimization  is  complemented  with  numerous  examples  on  classical  image 
datasets. 

Index  Terms —  Compressed  Sensing,  Image  Patches,  Overcomplete  Dictionary,  Sensing 
Projection  Matrix,  Sparse  Representation,  Learning. 


I.  INTRODUCTION 

Image  compression  algorithms  have  been  successfully  employed  in  the  past  to  transform  a  high 
resolution  image  into  a  relatively  small  set  of  (quantized)  coefficients  that  efficiently  represent  the  image 
on  an  appropriate,  often  orthonormal,  basis  such  as  DCT  or  wavelets.  This  representation  is  designed  to 
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preserve  the  essential  eontent  of  the  image  while  at  the  same  time  redueing  eosts  in  storage,  proeessing, 
and  transmission.  Sinee  natural  images  ean  be  eompressed  on  an  appropriate  basis,  sampling  the  seene 
into  millions  of  pixels  to  obtain  high  resolution  images  that  are  then  to  be  eompressed  before  proeessing, 
seems  often  to  be  wasteful  [1]-[1 1].  The  main  reason  why  signals  in  general  and  images  in  partieular  have 
been  traditionally  sensed  using  a  large  number  of  samples  is  the  Shannon-Nyquist  Theorem:  the  sampling 
rate  must  be  at  least  twiee  the  bandwidth  of  the  signal.  Images  are  not  naturally  band  limited,  however, 
aequisition  systems  use  anti-aliasing  low  pass  filters  before  sampling,  henee,  Shannon-Nyquist  Theorem 
plays  an  implieit  role  in  images  and  signals  in  general. 

Compressive  sensing  (CS)  is  an  emerging  framework  stating  that  sparse  signals,  i.e.,  signals  that  have  a 
eoneise  (sparse)  representation  on  an  appropriate  basis,  ean  be  exaetly  reeovered  from  a  number  of  linear 
projeetions  of  dimension  eonsiderably  lower  than  the  number  of  samples  required  by  the  Shannon- 
Nyquist  Theorem  (in  the  order  of  2-3  times  the  sparsity  of  the  signal,  regardless  of  the  aetual  signal 
bandwidth)  [l]-[5],[7].  In  addition,  signals  that  are  well  approximated  by  sparse  representations  (i.e., 
compressible),  sueh  as  natural  images  [12]-[18],  ean  be  also  sensed  by  linear  measurements  at  a  mueh 
lower  rate  than  double  their  aetual  bandwidth,  as  required  by  the  Shannon-Nyquist  Theorem,  with 
minimum  loss  of  information  [l]-[3].  This  means  that  instead  of  sensing  an  image  using  millions  of  pixels 
to  obtain  high  resolution,  the  image  ean  be  sensed  direetly  in  eompressed  form,  by  sampling  a  relatively 
small  number  of  projeetions  that  depends  on  the  aetual  sparsity  (and  not  bandwidth)  of  the  image  [1]- 
[11].^ 

Compressive  sensing  relies  on  two  fundamental  prineiples,  e.g.,  see  the  reeent  review  [4]: 

•  Sparsity:  Let  x  E  be  an  A^-pixels  image  and  V  =  [i|ii  ...  an  orthonormal  basis  (dietionary), ^ 

^  There  might  be  many  reasons  for  still  sampling  following  the  traditional  Shannon-Nyquist  requirements,  even 
for  very  sparse  signals,  e.g.,  the  simplicity  of  the  reconstmction  of  band  limited  signals  from  their  samples  as  well  as 
the  existence  of  very  efficient  sampling  and  reconstmction  hardware  and  software.  We  consider  CS  and  the  concepts 
of  it  exploited  in  this  paper  as  an  alternative  and  addition  to  the  classical  Shannon-Nyquist  framework,  viable  and 
very  useful  for  many  signal  and  image  processing  scenarios,  but  not  at  all  as  a  replacement  for  it. 

^  While  following  the  basic  theory  of  CS,  we  consider  for  the  moment  an  orthonormal  basis,  the  concept  of 
sparsity  is  much  more  general,  and  best  applied  in  image  processing  when  including  overcomplete  dictionaries,  as 
exploited  in  this  paper. 


IEEE  TRANSACTIONS  ON  IMAGE  PROCESSING 


3 


with  elements  also  in  sueh  that 

(1) 

where  0  =  [9i  ...  9^]^  is  the  veetor  of  eoeffieients  that  represents  x  on  the  basis  V.  A  signal  or 

image  is  said  to  be  sparse  if  most  of  the  eoeffieients  of  0  are  zero  or  they  ean  be  disearded  without 
mueh  loss  of  ‘‘information.”  Let  be  the  image  where  only  the  S  largest  eoeffieients  of  0  are  kept  and 
the  rest  are  set  to  zero  (obtaining  05),  i.e.,  X5  =  V05.  If  the  eoeffieients,  sorted  in  deereasing  order  of 
magnitude,  deerease  quiekly,  then  x  is  very  well  approximated  by  X5,  when  properly  seleeting  both  S 
and  the  basis/dietionary  V.  Sueh  signal  is  said  to  be  (approximately)  ^S-sparse.  Natural  images  are 
known  to  be  sparse,  with  S  signifieantly  lower  than  the  aetual  image  dimension,  when  represented  on  an 
appropriated  basis  sueh  as  wavelets,  sinusoids,  or  a  learned  (overeomplete)  dietionary.  Sparse 
representations  form  the  basis  of  many  sueeessful  image  proeessing  and  analysis  algorithms,  from 
JPEG  and  JPEG2000  eompression  [19]-[21],  to  image  enhaneement  and  elassifieation,  e.g.,  [12],  [22]- 
[25]. 

•  Incoherent  Sampling:  Let  4>  =  [(|>i  ...  be  an  mxN  sampling  matrix,  with  m  «  A,  sueh  that 

y  =  Ox  is  an  mxl  veetor  of  linear  measurements  (meaning  we  no  longer  observe  the  image  x  but  an 
undereomplete  linear  projeetion  of  it).  Compressive  Sensing  theory  requires  that  O,  the  sensing  matrix, 
and  O,  the  sparse  representation  matrix,  be  as  ineoherent  (orthogonal)  as  possible.  A  measure  of 
eoherenee  between  O  and  O  is  given  by 

•  rnciX'i<i<fYi^  (2) 

l<j<N 

/i(0,0)  E  [1,  Va]  measures  the  maximal  eorrelation  between  both  matrix  elements  (see  also  [26],  [27] 
for  the  related  definition  of  mutual  coherence  of  a  dietionary,  whieh  plays  an  important  role  in  the 
sueeess  of  basis  pursuit  and  the  greedy  sparsifying  orthogonal  matching  pursuit  algorithm  as  well  [26]- 
[31];  see  below). 

CS  deals  with  the  ease  of  low  eoherenee  between  the  sensing  and  sparsifying  matriees.  Intuitively,  one 
ean  see  that  4>  and  V  must  be  uneorrelated  (ineoherent),  sueh  that  the  samples  add  new  information  that 
is  not  already  represented  by  the  known  basis  V.  It  turns  out  that  random  matriees,  e.g.,  Gaussians  or  ±1 
random  matriees,  are  largely  ineoherent  with  any  fixed  sparsifying  basis  V  with  overwhelming 
probability.  This  has  lead  CS,  at  least  at  the  theoretieal  level  as  well  as  early  applieations  in  image 
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processing  [10],  to  strongly  rely  on  random  sensing  matrices,  since  they  provide  universally  incoherent 
sensing-sparsifying  pairs  and  are  well  conditioned  for  reconstruction. 

Compressed  sensing  combines  both  concepts  of  sparsity  and  incoherence  between  the  sensing  and 
sparsifying  matrix  by  reconstructing  the  sparsest  possible  signal  that  agrees  with  the  undercomplete 
(m  «  N)  set  of  measurements.  Let  y  be  the  vector  of  m  linear  measurements  of  the  sparse  signal  x  using 
the  sampling/sensing  matrix  4>.  The  retrieval  of  x  from  y  can  be  done  by  £Q-'''noYm''  minimization  (note 
how  both  the  sensing  and  the  sparsifying  matrix  appear  in  the  formulation), 

0  :=  minellBII^Q  subject  to  y  =  4>V0,  x  =  V0,  (3) 

where  the  £Q-‘nonn''  simply  counts  the  number  of  non-zeros  in  0.^  Compressive  Sensing  theory  ensures 
(in  one  of  the  many  recent  fundamental  results),  that  if  the  number  of  measurements  m  satisfies 

m  >  C  •  /i2(0,  V)  •  5  •  logiV,  (4) 

for  some  positive  constant  C,  then,  with  overwhelming  probability,  the  reconstruction  is  exact  (even 
actually  using  for  sparse  promotion  instead  of  making  the  problem  (3)  convex). 

As  mentioned  above,  related  to  /i(4>,V)  is  the  notion  of  mutual  coherence  /i(V)  of  the  sparsifying 
dictionary  (or  the  equivalent  dictionary  D  :=  OV),  which  is  the  largest  absolute  normalized  inner  product 
between  the  atoms  of  the  dictionary  (see  Equation  (7)  in  the  next  section  for  the  exact  definition).  If  the 
following  inequality  holds  [26],  [27],  [30], 

then,  0  is  necessarily  the  sparsest  solution  (miriQWQW^^)  such  that  x  =  V0  (or  y  =  4>V0  when 
considering  sensing  and  the  equivalent  dictionary  D),  and  Orthogonal  Matching  Pursuit  (OMP),  a  fast 
greedy  algorithm  used  to  solve  sparse  representation  problems,  is  guaranteed  to  succeed  in  finding  the 
correct  0  (same  for  the  basis  pursuit,  which  again  replaces  the  by  see  e.g.,  [26],  [27],  [29]-[32]. 
Note  of  course  that  this  property  on  the  dictionary,  and  in  contrast  with  (5),  does  not  explicitly  consider 
the  number  of  samples  m. 

Since,  in  general,  signals  of  interest  are  not  exactly  sparse,  but  nearly  sparse,  and  also  contain  noise  added 
by  the  measurement  system,  it  is  imperative  for  CS  to  be  robust  under  such  non-idealities  [4].  A  key 


^  This  is  not  an  actual  norm,  but  it  is  often  referred  to  as  such. 

The  use  of  the  f  ^  norm  brings  interesting  connections  with  robust  statistics  and  regression. 


IEEE  TRANSACTIONS  ON  IMAGE  PROCESSING 


5 


notion  in  CS  theory  that  comes  to  the  rescue  in  this  scenario  is  the  Restricted  Isometry  Property  (RIP)  [1]- 
[4]  (and  references  therein,  see  also  [34]  for  some  results  for  overcomplete  dictionaries).  The  iS-restricted 
isometry  constant  is  the  smallest  0  <  65  <  1  such  that 

VT<S:  (1-5s)I|0tII',  <  110x0x1113  ^  O  +  <^5)110x1113, 
where  Dj  is  a  subset  of  T  columns  extracted  from  the  equivalent  dictionary  D  :=  4>V,  and  Qj  are  the 
(sparse)  coefficients  corresponding  to  the  T  selected  columns.  In  words,  for  proper  values  of  S^,  the  RIP 
ensures  that  any  subset,  with  cardinality  less  than  S,  of  columns  of  the  equivalent  dictionary  4>V  are 
nearly  orthogonal  (the  columns  cannot  be  exactly  orthogonal  since  we  have  more  columns  that  rows),  i.e., 
incoherence  between  4>  and  V  is  ensured.  If  the  RIP  holds  (see  the  above  mentioned  references  for  the 
exact  needed  values  of  the  isometry  constant),  greedy  algorithms  such  as  regularized  OMP,  [35],  and 
convex  optimization  are  guaranteed  to  succeed  (the  results  for  OMP  are  weaker).  This  holds  as  well, 
again  with  the  possibility  to  optimize  with  the  norm  instead  of  in  the  presence  of  noise  and  for 
signals  with  non-exact  sparsity  [l]-[8].  Hence,  Equation  (3)  becomes  in  practice, 

"1^0110114  subject  to  ||y-<D'I'e||  <  e,  (6) 

■^2 

where  6  >  0  takes  into  account  the  possibility  of  noise  in  the  linear  measurements  and  of  non-exact 
sparsity. 

While  the  sampling  projection  matrix  4>  should,  in  theory,  be  independent  of  the  signal,  the  sparsifying 
basis  ^  should  adapt  as  much  as  possible  to  the  image  at  hand,  e.g.,  to  make  the  representation  as  sparse 
as  possible.  A  key  result  in  image  processing  is  that  images  can  be  coded  and  sparsely  represented  more 
efficiently  using  (often  learned)  overcomplete  dictionaries  rather  than  fixed  bases,  e.g.,  [18],  [33],  [36]. 
Let  Xp  E  n  «  A,  be  a  patch,  i.e.,  a  square  portion  of  the  image  x  of  size  B  x  B  =  n  pixels.  An 
overcomplete  dictionary  is  an  n  X  /f  matrix  W  =  [i|ii  ...  n  <  K,  such  that  Xp  =  V0p,  and  for  the 

AT-dimensional  vector  of  coefficients  0p,  we  have  ||0p||^  «n.  Such  dictionaries  are  learned  from 

patches  extracted  from  large  datasets  of  images,  and  thereby  are  optimized  for  the  data  (they  can  also  be 
optimized  to  a  particular  image  class,  e.g.,  [16],  [22]).  From  the  learned  non-parametric  overcomplete 
dictionaries,  state-of-the-art  results  in  image  denoising,  inpainting,  and  demosaicing,  have  been  obtained 
in  the  past  [12].  This  learning  framework  is  also  exploited  in  this  work. 

Overcomplete  learned  dictionaries  are  not  orthonormal  bases  and  hence,  the  full  extent  of  the  CS  theory 
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does  not  entirely  apply  here  (see  for  example  [7],  [34]  for  some  results).  Reeently,  Elad  [37]  (see  also 
[38],  [39]),  showed  experimentally  that  a  well-designed  sampling  matrix  ean  signifieantly  improve  the 
performanee  of  CS  when  eompared  to  random  matriees,  in  terms  of  improving  the  ineoherenee  for  a 
given  dietionary  and  the  reeonstruetion  aeeuraey  from  the  eorresponding  linear  samples.  This  means  that 
for  a  speeifie  pre-defined  or  pre-learned  overeomplete  dietionary,  a  sampling  projeetion  matrix 
speeifieally  designed  for  the  dietionary  ean  indeed  improve  CS  over  a  generie  random  sampling  matrix. 
Note  that  the  sampling  is  not  adaptive,  is  just  optimized  for  the  signal  elass.  Theoretieal  studies,  with 
praetieal  implieations,  regarding  the  eonstruetion  of  deterministie  sampling  matriees  and  their  RIP,  are 
starting  to  appear,  e.g.,  see  [40],  [41]  and  referenees  therein. 

In  this  paper,  we  introduee  a  framework  for  simultaneously  learning  the  overeomplete  non-parametrie 
dietionary  V  and  the  sensing  matrix  4>.  That  is,  in  eontrast  with  Elad’s  work  and  those  briefly  diseussed 
in  the  next  seetion,  we  do  not  eonsider  a  pre-learned  or  pre-defined  dietionary;  we  learn  it  together  with 
the  sensing  matrix.  In  eontrast  with  the  more  standard  CS  framework,  we  do  not  assume  the  sparsifying 
basis  (or  dietionary)  is  given  and  eonsider  universal  sampling  strategies,  but  simultaneously  optimize  for 
both  these  eritieal  eomponents  exploiting  image  datasets.  Contrasting  also  with  earlier  work  on  the 
learning  of  overeomplete  non-parametrie  dietionaries,  we  eonsider  linear  projeetions  of  the  image  as  the 
available  measurements  for  reeonstruetion,  and  not  the  image  itself  (or  a  noisy  version  of  it).  We 
experimentally  show  that  the  proposed  framework  of  simultaneous  optimization  of  the  sensing  matrix  and 
sparsifying  dietionary  leads  to  improvements  in  the  image  reeonstruetion  results.  We  also  show  that  the 
learned  sensing  matriees  have  larger  ineoherenee,  as  requested  by  the  RIP,  for  a  given  dietionary,  than 
random  matriees  and  the  ones  obtained  with  the  algorithm  proposed  by  Elad,  on  top  of  leading  to  lower 
image  reeonstruetion  errors.  Computational  improvements,  when  eompared  with  [37],  are  obtained  as 
well  with  our  proposed  framework  when  eonsidering  a  given  dietionary,  and  these  form  the  basis  of  our 
proposed  simultaneous  optimization  framework. 

The  remainder  of  this  paper  is  organized  as  follows.  In  Seetion  II,  we  review  Elad’s  approaeh  [37]  (as 
well  as  briefly  [38]  and  other  reeent  related  works),  and  introduee  our  proposed  new  algorithm  to  learn 
deterministie  sensing  matriees  when  the  sparsifying  dietionary  is  given.  We  show  that  the  algorithm  is 
signifieantly  faster  than  the  one  in  [37]  and  leads  to  improved  performanee  in  terms  of  sensing-dietionary 
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incoherence  and  accuracy  of  the  reconstructed  images.  In  Section  III,  we  review  the  KSVD  algorithm  for 
learning  overcomplete  non-parametric  dictionaries  from  image  datasets,  [12],  [18],  [42],  and  introduce  the 
novel  coupled-KSVD  as  a  necessary  modification  to  include  the  simultaneous  learning  of  both  the 
dictionary  and  the  corresponding  sampling  projection  matrix.  In  Section  IV,  we  present  detailed 
experimental  results  indicating  the  superiority  of  our  new  framework  to  construct  deterministic  sensing 
matrices,  and  the  results  of  simultaneously  learning  generic  sensing  matrices  and  overcomplete 
dictionaries  using  datasets  of  image  patches.  Finally,  concluding  remarks  and  directions  for  future 
research  are  presented  in  Section  V. 

While  the  framework  here  introduced  is  applicable  to  signals  in  general,  from  now  on,  we  consider  only 
natural  images,  and  the  sparsifying  basis  will  always  be  overcomplete  dictionaries  learned  from  the 
images. 


II.  Optimized  Projections  For  Compressive  Sampeing 

In  this  section  we  show  how  to  optimize  the  sensing  matrix  given  a  sparsifying  dictionary.  We  start  by 
reviewing  prior  related  work,  followed  by  the  presentation  of  our  proposed  algorithm  and  the  first  results 
showing  the  computational  and  reconstruction  advantages  of  this  new  approach. 

A.  Previous  Related  Work 

To  the  best  of  our  knowledge,  only  very  few  recent  publications,  e.g.,  [37]-[41],  explicitly  address  the 
idea  that  non-random  matrices  are  important  and  could  be  more  effective  than  random  projection  matrices 
for  sensing  sparse  signals.  In  particular,  [39]  shows  that  chirp-based  sampling  matrices  can  be  used 
instead  of  random  sampling  matrices,  retaining  the  same  reconstruction  accuracy,  but  with  the  advantage 
that  the  retrieval  of  the  original  signal  becomes  computationally  much  cheaper  than  using,  for  example, 
orthogonal  matching  pursuit  (OMP).  On  the  other  hand,  [37],  [38]  address  a  different  idea:  random 
sampling  matrices  are  not  necessarily  optimal,  in  the  reconstruction  error  sense,  to  sample  specific  classes 
of  sparse  signals,  and  in  particular,  natural  images. 

Considering  well-known  characteristics  and  models  for  the  second  order  statistics  of  natural  images,  see 
e.g.,  [43]-[46],  Weiss  et  al.,  [38],  first  showed  that  the  Signal  to  Noise  Ratio  (SNR)  of  images  projected 
using  (almost  any)  random  sampling  matrices  goes  to  zero  as  the  number  of  pixels  increase,  while  it 
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remains  constant  for  their  proposed  deterministic  sampling  matrices.  They  then  introduced  the  concept  of 
Uncertain  Component  Analysis  (UCA),  that  leads  to  maximize  the  posterior  probability  of  the  data  x,  for 
a  given  projection  matrix  4>  and  training  projection  data  y.  Let  ... ,  Xp  be  a  set  of  P  training  patches  and 
y^,  ...,yp  their  respective  projections,  y^  =  Ox^,  \<i<P.  Then,  the  optimal  projection  matrix,  maximizing 
the  probability  of  retrieving  the  original  patches,  is  given  by  O  :=  argmax^,  ||4>||=i  Hi  PCXily^;  O).  The 


prior  probability  is  assumed  to  be  i.i.d.  Gaussian,  P(yj|x;4>)  = 


1 

y/2no^ 


e 


2cj2  .  Experimentally,  the 


authors  found  that  UCA  works  only  slightly  better  than  random  projection  on  synthetic  signals  of  fixed 
sparsity,  although  it  works  significantly  better  than  random  projections  on  patches  of  natural  images. 

Elad  [37]  introduced  a  new  algorithm  that  does  not  make  any  assumptions  on  the  statistics  of  the  data  set, 
but  attempts  to  improve  directly  the  incoherence  between  the  m  X  n  sampling  matrix  4>  (m  samples  of  the 
n  dimensional  signal),^  and  the  n  X  /f  sparsifying  dictionary  V,  as  required  by  the  CS  theory,  assuming  a 
given  overcomplete  dictionary  V  has  been  already  provided  (note  that  [38]  is  not  explicitly  based  on  a 
dictionary,  which  in  our  proposed  work  below  we  directly  learn  from  the  data).  Experimentally,  Elad 
showed,  using  synthetic  random  signals  of  fixed  and  exact  sparsity,  that  a  well-designed  projection  matrix 
that  depends  on  the  sparsifying  basis  V,  can  reduce  the  mutual  coherence  of  the  equivalent  dictionary 
D  =  and  hence,  reduce  the  reconstruction  error  from  the  projections. 

Our  work  follows  in  part  Elad’s  idea,  since  we  do  not  assume  any  prior  knowledge  or  statistics  on  the  data 
set,  while  in  contrast  with  [37],  we  do  not  assume  a  given  sparsifying  dictionary.  In  fact,  both  the 
overcomplete  dictionary,  V,  and  projection  matrix,  4>,  are  learned  from  the  dataset  (see  next  section  for 
the  general  case  of  simultaneous  learning).  Both  [37]  and  [38]  provide  a  way  of  computing  4>  from  data 
or  a  dictionary,  although  do  not  provide  any  hint  on  how  4>  might  help  the  actual  learning  of  the 
sparsifying  basis  V,  and  vice  versa. 

Our  work  starts  with  presenting  an  alternative  to  the  framework  in  [37]  for  learning  the  projecting  matrix 
from  the  previously  learned  dictionary.  This  alternative,  as  we  show  here,  is  computationally  more 
efficient  and  produces  significantly  better  results.  We  therefore  start  by  briefly  describing  the  algorithm  in 
[37].  Motivated  by  the  incoherence  required  by  CS,  as  well  as  the  fundamental  conditions  for  optimal 


^  We  use  n  for  the  signal  dimension  since  we  will  work  with  patches. 
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performance  of  orthogonal  matching  pursuit  and  basis  pursuit  (see  previous  section),  Elad  proposed  to 
reduce  the  mutual  coherence,  /i(D),  of  the  equivalent  dictionary  D  :=  D  =  [d^  ...  d^],  defined 

as  [37] 


M(D)  :  =  TTiCiXi^ 


d/d 


I 


(7) 


Hence,  if  the  projection  matrix  <I»  is  designed  such  that  is  minimal  (recall  that 

is  fix  for  the  moment),  a  larger  number  of  signals  would  satisfy  (5)  and  be  successfully  recovered 
from  their  linear  projections. 

Instead  of  minimizing  (7),  Elad  proposes  to  work  with 


Mt(D) 


S  1  <  i  j  < /C  (1 i;j  ^ 


where  gij  =  djdj,  dj  =  dj/IIdjH^^?  Hs  a  scalar  that  establishes  the  minimum  value  of  jUt(I>).  From 


Equation  (8),  then  is  obvious  that  jUt>  t.  Hence,  t  is  the  target  value  Elad  proposes  to  minimize. 

An  alternative  way  of  looking  at  the  mutual  coherence  of  the  equivalent  dictionary  D  is  to  consider  the 
Gramm  matrix,  G  :=  D^D,  where  D  =  [d^  ...  d^]  is  the  equivalent  dictionary  D  with  all  its  columns 

normalized.  Elad’s  idea  is  to  minimize  the  largest  absolute  values  of  the  off-diagonals  in  the 
corresponding  Gramm  matrix,  while  keeping  the  rank  of  the  equivalent  dictionary  equal  to  m«n.  This  in 
turn  minimizes  jUf.  Instead  of  targeting  jUt(D),  we  address  the  problem  of  making  any  subset  of  columns  in 
D  as  orthogonal  as  possible,  or  equivalently,  G  should  be  as  close  as  possible  to  the  identity  matrix.  We 
then  directly  target  what  we  have  learned  from  the  RIP,  which  guarantees  robustness  of  CS  to  noise  and 
non-exact  sparsity.  As  we  show  below,  our  proposed  approach  outperforms  the  one  introduced  in  [37], 
especially  for  real  not-exactly  sparse  signals  (images).  Computationally,  the  proposed  algorithm  is 
significantly  more  efficient  than  the  one  in  [37].  Moreover,  after  introducing  the  proposed  algorithm  to 
achieve  this  close  to  the  identity  Gramm  matrix,  we  also  introduce  the  novel  idea  of  simultaneously 
designing  4>  and  V,  which,  to  the  best  of  our  knowledge,  has  not  been  addressed  before. 


B.  Learning  the  Sensing  Projection  Matrix 

As  mentioned  above,  and  considering  for  the  moment  that  the  dictionary  V  is  known,  we  want  to  find  the 
sensing  matrix  4>  such  that  the  corresponding  Gramm  matrix  is  as  close  to  the  identity  as  possible,  i.e., 

^Kn<m^^mn^nK  I^. 
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Let  us  multiply  both  sides  of  the  previous  expression  by  V  on  the  left  and  on  the  right,  obtaining 

_  IJJipT 

Let  VAV^  be  the  (known)  eigen-deeomposition  of  then  VAV^O^OVAV^  VAV^,  whieh  is 

equivalent  to  AV^O^OVA  A.  Let  us  denote  :=  ^mn^nn^  henee,  AT^TA  A.  We  want  to 
eompute  0(r)  in  order  to  minimize 

||A  -  ArTrAll^.  (9) 

Note  that  if  V  is  an  orthonormal  basis  and  m  =  n  (non  standard  in  CS),  then  the  previous  equation  would 

1 

have  an  exaet  solution  that  produees  zero  error,  i.e.,  T  =  A  2.  However,  sinee  the  dietionary  is 
overeomplete,  a  eritieal  aspeet  for  aehieving  high  sparsity  and  state-of-the-art  image  reeonstruetion  [12], 
[18],  [42],  and  in  partieular  m  «n  (and  then  m  «  ran/c(A)),  we  have  to  find  an  approximated  solution 
for  minimizing  the  error  in  Equation  (9).  We  will  aehieve  this  starting  from  a  random  sensing  matrix  4> 
(and  its  eorresponding  T),  and  progressively  improving  it  in  order  to  reduee  this  error.^  This  is  detailed 
next. 


Let  be  the  eigenvalues  of  the  known  diagonal  matrix  A,  ordered  in  deereasing  order  of 

magnitude,  and  =  [^1  ■■■  Then,  Equation  (9)  beeomes  ||A  —  where  = 

[AiTj  i  ...  or  equivalently. 


Uj 


T  T 

ViV/  -  VjVj 


(10) 


Let  us  define  E  ■— A  —  Ey  —  A  —  and  let  Ey  =  UyAyljJ  be  the  eigen- 

2  2 

deeomposition  of  Ey.  Then,  Equation  (10)  beeomes  ||Ey— VyvJH^  = 


^  Alternatively,  we  can  obtain  a  closed  solution  to  (9)  of  the  form  f  :=  [H,  r2]?  where  L  is  a  diagonal  matrix 
obtained  from  the  top  m  eigenvectors  of  A  (elevated  to  the  -1/2  power).  While  this  provides  a  slightly  faster 
algorithm  than  the  m-steps  here  proposed,  it  produces  virtually  the  same  reconstmction  results.  Our  proposed 
approach  follows  the  algorithmic  concepts  of  the  KSVD  and  couple-KSVD  described  in  the  next  section  for 
dictionary  learning,  and  shows  how  to  progressively  improve  the  classical  random  matrix  of  CS,  also  providing  one 
possible  solution  to  addressing  the  possible  ambiguity  provided  by  this  closed  form  alternative  f.  (We  thank  Donald 
Goldfarb  and  Shiqian  Ma  for  pointing  out  this  closed  solution  and  additional  comments  on  the  minimization  of  (9).) 


IEEE  TRANSACTIONS  ON  IMAGE  PROCESSING 


11 


set  Vj  =  y,  j  being  the  largest  eigenvalue  of  Ey  and  y  its  eorresponding  eigenveetor,  then  the 

largest  error  eomponent  in  E  is  eliminated.  Replaeing  Vy  baek  in  terms  of  Ty  (the  rows  of  the  matrix  we 
are  optimizing  for,  T  =  4>V), 

■■■  (11) 

Sinee  the  matrix  A  is  in  general  not  full-rank,  then  for  some  r  >  0,  will  be  zero,  and  we  ean 

only  update  the  Ty  ...,Ty^^_^  eomponents  of  Ty.  This  derivation  forms  the  basis  of  our  algorithm  for 
optimizing  (9),  see  below  for  the  exaet  steps. 

Onee  we  obtain  f  ,  then  4>  ean  be  easily  eomputed  following  the  relationship  f  =  4>V,  as  4>  =  fV^. 
Also,  sinee  we  ean  only  reduee  m  «n  eomponents  of  the  error  matrix  E,  and  the  error  has  a  rank  lower 
but  elose  to  n  (reeall  that  V  is  almost  an  orthonormal  basis  for  M^),  then  there  is  no  aetual  hope  to 
eompletely  eliminate  the  error  in  (9).  The  proposed  teehnique  aims  at  redueing  the  largest  m  eomponents 
of  this  error  matrix  E. 

In  summary,  the  following  are  the  steps  of  the  proposed  algorithm  for  optimizing  the  sensing  matrix  given 
the  dietionary  (using  the  notation  defined  above): 

1 .  Initialize  4>,  for  example,  to  a  random  matrix. 

2.  Find  the  eigen-deeomposition  =  VAV^  and  r,  the  number  of  non-zero  eigenvalues  of 

3.  Initialize  f  :=  OV. 

4.  For 7=1  tom 

Compute  Ey. 

Find  the  largest  eigenvalue  and  eorresponding  eigenveetor  of  Ey,  y  and  y. 

Use  (1 1)  to  update  the  first  r  eomponents  of  Ty  (thereby  updating  f ). 

5.  Compute  the  optimal  4>  =  fV^. 

C.  Some  Preliminary  Experimental  Results 

As  we  will  show  in  the  experimental  results,  Seetion  IV,  this  parameter-free  algorithm  is  not  only 
eonsiderably  faster  than  the  one  introdueed  in  [37],  but  also  signifieantly  improves  the  reeonstruetion 
results  and  provides  a  fundamental  building  bloek  for  the  simultaneous  optimization  of  the  dietionary  and 
sensing  matrix.  Before  these  more  detailed  experimental  results,  let  us  now  present  some  illustrative 
results  showing  the  advantages  of  the  proposed  methodology  over  the  algorithm  proposed  by  [37].  For 
this,  we  will  also  eonsider  below  the  (average)  mutual  eoherenee. 
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_  (12) 

K(^K-iy 

This  is  simply  the  mean  square  error  that  aeeounts  for  the  off-diagonal  elements  in  the  Gramm  matrix, 
while  (8)  only  aeeounts  for  the  maximum  off-diagonal  value. 

Figure  1  eompares,  for  three  different  sensing  matriees  4>,  the  distribution  of  the  absolute  value  of  the  off- 
diagonal  elements  of  the  Gramm  matrix  obtained  using  a  dietionary  V  learned  from  440  natural  images 
(see  Seetion  IV  for  more  details  on  this  dietionary).  The  three  sensing  matriees  are  a  Gaussian  random 
sampling  matrix  (as  eommonly  used  in  CS),  the  sensing  matrix  obtained  using  Elad’s  algorithm  (with 
parameters  optimized  to  reduee  /it(D);  y  =  0.6,t  =  20%),  and  our  new  proposed  algorithm. 
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Figure  1:  Distribution  of  the  off-diagonal  elements  of  the  Gramm  matrix  obtained  using  a  random  sampling 
matrix,  Elad^s  algorithm,  and  the  new  proposed  algorithm. 

Both  our  teehnique  and  the  one  proposed  in  [37]  try  to  reduee  the  largest  off-diagonal  elements  in  the 
Gramm  matrix,  however,  Elad’s  algorithm  always  presents  a  eonsistent  artifaet  (see  also  [37]),  where 
some  off-diagonal  elements  in  the  Gramm  matrix  aetually  inerease  their  value  (notiee  the  peak  between 
0.2  and  0.4  in  Figure  1),  whieh  does  not  affeet  /if  but  it  does  affeet  negatively  the  RIP  (see  previous 
Seetion).  Our  proposed  algorithm  inereases  the  frequeney  of  the  off-diagonal  elements  with  low  absolute 
value  (between  0  and  0.2,  Figure  1),  and  reduees  the  frequeney  of  large  absolute  values  in  the  Gramm 
matrix,  better  enforeing  thus  the  RIP,  sinee  it  tries  to  make  the  eolumns  of  D  as  elose  to  orthogonal  as 
possible. 

This  better  behavior  of  the  Gramm  matrix  (reeall  the  RIP)  is  refleeted  in  inereased  aeeuraey  in  the  image 
reeonstruetion  as  well.  Table  1  eompares  Elad’s  and  our  new  proposed  algorithm  to  design  projeetion 
matriees,  using  patehes/signals  eorresponding  to  synthetie  data  of  fixed  sparsity  (as  in  [37]),  patehes 
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coming  from  real  images  pre -projected  to  have  a  fixed  sparsity,  and  patches  coming  from  real  images 
without  restricting  their  sparsity  (compressible  patches  in  contrast  to  sparse  ones).  The  algorithm  in  [37] 
enforces  a  measure  of  incoherence,  but  not  directly  the  RIP  and  its  intuition,  and  does  not  actually 
outperform  random  sampling,  in  the  mean  square  error  (MSE)  sense,  when  the  patches  do  not  have  an 
exact  fixed  sparsity  (see  Table  1).  The  first  row  in  Table  1  corresponds  to  10000  synthetic  signals 
obtained  by  combining  at  random  S=4  columns  of  a  randomly  generated  64x256  dictionary.  The  second 
row  corresponds  to  6600  patches  projected  to  have  a  fixed  sparsity,  S=6,  obtained  using  OMP  to 
reconstruct  the  real  image  patches;  the  dictionary  V  of  size  64x256  was  trained  using  these  patches  and 
the  KSVD  algorithm  (see  next  section).  The  last  row  corresponds  to  6600  patches  selected  at  random 
from  440  real  images  and  a  64x256  dictionary  V  trained  with  them  using  KSVD.  The  table  reports 
average  and  variance  results. 


Table  1:  Comparison  of  Elad^s  vs  the  new  proposed  algorithm. 


B 

□ 

Patches 

Projection 

time(s) 

Ft 

Mg 

MSE 

^MSE 

Random 

- 

0.482 

0.005 

0.002 

2.957 

0.071 

4 

16 

Synthetic 

Elad'algorithm 

794.53 

0.434 

0.009 

0.002 

1.694 

0.071 

New  Algorithm 

0.23 

0.421 

2.87E-04 

0.059 

1.37E-05 

1.549 

0.024 

6 

15 

Image,  fixed 
sparsity 

Random 

Elad'algorithm 

341.55 

0.582 

0.515 

0.0139 

0.0146 

0.115 

0.095 

0.0062 

0.0052 

0.152 

0.076 

0.029 

0.006 

New  Algorithm 

0.22 

0.456 

6.91E-04 

0.063 

5.09E-05 

0.022 

0.001 

Random 

- 

0.535 

0.010 

0.097 

0.004 

0.331 

0.028 

6 

15 

Real  images 

Elad'algorithm 

426.85 

0.487 

0.013 

0.085 

0.004 

0.357 

0.022 

New  Algorithm 

0.80 

0.442 

4.22E-04 

0.064 

4.69E-05 

0.002 

From  this  table  we  note  that  /it(D),  computed  using  (8),  is  similar  in  Elad’s  and  our  algorithm,  but  /i^(D) 
is  lower  for  the  proposed  algorithm.  The  reason  is  that  (8)  only  takes  into  account  the  maximum  off- 
diagonal  value  in  the  Gramm  matrix,  ignoring  the  value  of  the  remaining  terms,  and  hence  the  artifact 
introduced  by  Elad’s  algorithm  (Figure  1).  Finally,  note  the  significant  MSE  improvements  obtained  with 
our  proposed  algorithm. 

For  illustration  purposes.  Table  1  also  shows  the  running  time  of  Elad’s  algorithm  versus  the  proposed 
new  algorithm,  on  a  laptop  with  a  single  1.6  Ghz  processor  and  1.5  Gb  of  RAM.  Even  though  the  time 
will  change  from  one  implementation  and  computer  to  another,  this  illustrates  the  significant 
computational  advantage  of  our  proposed  technique,  Elad’s  algorithm  takes  about  600  times  longer. 
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Since  Elad’s  algorithm  is  not  robust  in  the  presence  of  non-idealities  such  as  non-exact  sparsity,  and  its 
running  time  becomes  prohibitive  for  practical  applications,  from  now  on  when  we  refer  to  the  non- 
random  sampling  matrix  4>(V)  computed  from  a  dictionary  V,  we  refer  to  the  new  algorithm  proposed 
here. 


III.  Simultaneously  Learning  the  Dictionary  and  the  Projection  Matrix 

It  is  now  time  to  turn  to  the  simultaneous  learning  of  the  sparsifying  dictionary  and  the  sensing  matrix. 
This  will  be  based  on  combining  the  just  introduced  approach  for  learning  the  sensing  matrix  with  the 
KSVD  algorithm  for  dictionary  learning.  We  start  then  by  briefly  introducing  this  KSVD  technique. 

A.  The  KSVD  algorithm 

Recently,  Aharon  et  al,  [18],  [42]  introduced  a  novel  algorithm  for  learning  overcomplete  dictionaries  to 
sparsely  represent  images.  Let  X  =  [^i  ■■■  ^p]  be  an  nxP  matrix  of  P  training  square  patches  of  length 

n  pixels  each,  used  to  train  an  overcomplete  dictionary  V  of  size  nxK,  with  P  »  K  and  K  >  n.  The 
objective  of  the  KSVD  algorithm  is  to  solve,  for  a  given  sparsity  level  S, 

mm,,,0||X-’I'0||^  s.t.  <S,  (13) 

where  0  =  [0i  •••  0p],  and  is  the  sparse  vector  of  coefficients  representing  the  patch  in  terms  of 

the  columns  of  the  dictionary  V  =  [i|ii  ...  i|i^:].  Starting  from  an  arbitrary  V,  the  KSVD  algorithm 

progressively  improves  it  in  order  to  optimize  the  above  expression,  as  described  next. 

Let  0  =  [8i  ...  8/^]^,  where  bj  are  the  rows  of  0.  Then,  as  in  the  previous  section,  the  error  term  in 

(13)  can  be  decomposed  as  ||X  —  —  ||^  “  ~  Let  us  define  E  :=  X  —  V0 

and  Ej  :=X  —  Yi^j  Then,  (13)  can  be  rewritten  as 

minxy_@\\Ej  s.  t.  Vi,  ||0j||^^  <  S.  (14) 

At  this  point  it  is  very  tempting  to  obtain  the  SVD  decomposition  of  Ey  and  eliminate  the  largest 
component  of  the  error  matrix  E  (see  previous  section).  However,  (14)  requires  also  satisfying  the  sparsity 
constrain.  Hence,  let  us  define  the  set  of  all  indices  corresponding  to  the  training  patches  that  use  the  atom 
i|iy  for  a  given  (temporary)  dictionary  i|i  (this  is  determined  simply  using  OMP  or  any  other  sparse 
representation  technique,  see  below),  i.e.. 


IEEE  TRANSACTIONS  ON  IMAGE  PROCESSING 


15 


cOj\  =  {p\l  <p<  P,8j(p)  ^0},  1  <7  <  K.  (15) 

In  matrix  form,  let  now  Slj  be  a  P  X  |a)y  |  matrix  with  ones  on  the  (<x)y(0^  0  entries  and  zero  elsewhere. 
Then,  (14)  ean  be  rewritten  as, 

min^,e\\EjSlj-}\,j8jSlj\\l,  (16) 

where  the  sparsity  set  eannot  be  altered  for  now.  Let  Ej  Eyfty,  S-y.  =  Sjfty,  thereby,  Ej  are  just  the 

eolumns  of  the  error  Ej  eorresponding  to  atom  i|iy  and  the  rows  of  0,  where  the  zeros  have  been 

removed.  Let  UAV^  be  the  singular  value  deeomposition  (SVD)  ofEf.  Then,  (16)  beeomes 

mm,,,0||UAV'r-il;y5j^||J.  (17) 

We  now  ean  eliminate  the  highest  eomponent  of  the  error  by  defining 

Illy  :=  Ui,  6y,R  :=  CTiVi,  (18) 

where  is  the  largest  singular  value  of  and  Ui,Vi  are  the  corresponding  left  and  right  singular 
veetors.  This  then  improves  the  dietionary  atom  i|iy  based  on  the  patehes  that  have  used  it  when 
eonsidering  the  temporary  dietionary  V.  This  eontinues  in  the  same  fashion  for  all  the  other  eolumns. 

In  summary,  the  KSVD  algorithm  eonsists  of  the  following  key  steps: 

1 .  Initialize  V 

2.  Repeat  until  eonvergenee: 

•  For  V  fixed,  solve  (13)  using  OMP  to  obtain  0,  ^  i.e.  0  =  OMP(V,X). 

•  Fory  =  1  toK 

Compute  cOj  and  from  there,  Ef,  8j^. 

Obtain  the  largest  singular  value  ofEf  and  the  eorresponding  singular  veetors. 

Update  V  and  0  using  (18).^ 

Experimentally,  we  found  that  initializing  V  with  an  overeomplete  dietionary  using  the  Diserete  Cosine 
Transform  (DCT)  [18]  produees  better  results  than  with  a  zero  mean,  normalized  random  matrix.  The 
results  reported  in  Seetion  IV  use  this  initialization  method. 


^  The  original  KSVD  uses  OMP  for  the  sparse  coding  step,  other  sparsifying  techniques  could  be  used  as  well. 

^  As  in  a  Gauss-Seidel  type  of  approach,  both  the  atom  and  the  corresponding  coefficients  are  updated  at  this  step. 
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B.  Coupled-KSVD 


Let  us  consider  now  the  problem  of  simultaneously  training  a  dictionary  V  and  the  projection  matrix  4>, 
with  the  images  available  from  a  dataset.  We  define  the  following  optimization  problem: 

mmv,,i,,0{a||X-'P0||^  +  ||Y-<l>’I'0||^}  s.  t.  Vi,  ||0J|^„  <  S,  (19) 

where  0  <  a  <  1  is  a  sealar  that  eontrols  the  weight  of  the  error  term  ||X  —  V0||f  and  Y  are  the  linear 
samples  given  by 

Y  =  OX  +  Ti,  (20) 

considering  T|  an  additive  noise  added  by  the  sensing  system.  Since  Y=[yi  ■■■  Yp]  is  an  mxP  matrix 
with  m  «n,  a  is  used  in  (19)  to  compensate  for  the  larger  value  of  the  reconstruction  error  given  by  the 
term  ||X  —  O0|||-,  and  to  give  more  importance  to  the  projection  error  term,  ||Y  —  OO0|||-,  which  is  what 
is  actually  available  at  the  reconstruction  stage.^  Notice  that  (19)  can  be  rewritten  as 

2 

s.t.  Vi,||0i||^^  <  5,  (21) 

A  possible  way  to  solve  (19),(21)  consists  in  extending  the  KSVD  algorithm  to  the  coupled  X  and  Y 
signals,  together  with  the  technique  to  adapt  4>  to  V  introduced  in  the  previous  section.  As  in  the  KSVD 
algorithm,  we  start  with  an  arbitrary  dictionary,  learn  the  sensing  matrix  most  appropriate  to  it  following 
the  approach  described  in  the  previous  section,  and  then  simultaneously  improve  both  of  them.  Let  us 
define 


Then,  (19),(21)  can  be  rewritten  as, 

||Z-Deq0||p  s.t.  Vi,  ||0j||^g  <  S, 
where  Dgq:  =  WV  =  ...  d^*^].  As  in  KSVD,  we  ean  write 


Z-2id®^5[||^ 


(22) 

(23) 

(24) 


^  While  at  the  sparsifying  dictionary  and  sensing  matrix  training  step,  we  have  available  both  the  images,  X,  and 
their  projections,  Y;  at  the  actual  reconstruction  step  we  have  only  the  sensed  data,  Y,  and  the  goal  is  to  reconstruct 
from  it  the  sparsest  possible  X,  with  the  learned  (^,  O)  (this  is  the  standard  CS/sparse-reconstmction  scenario,  but 
with  our  optimized  pair  4>)). 
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where  Sj  are  the  rows  of  0  as  defined  previously  for  the  standard  KSVD  algorithm.  Let  us  now  define 
E  :=  Z  —  DeqO,  Ey  :=  Z  —  Considering  also  and  fty,  as  defined  in  the  KSVD 

algorithm,  then  Equation  (23)  ean  be  rewritten  as 


TTUriijj  <j)  0 

Similarly,  let  UAV^  be  the  singular  value  decomposition  (SVD)  of  Ej^,  then  (25)  becomes 


(25) 


TTllTiip  <j)  0 


UAV'^  - 


and  the  highest  eomponent  of  the  (eoupled)  error  ean  be  eliminated  defining 


(26) 


(27) 


where  is  the  largest  singular  value  of  Ey^  and  are  the  eorresponding  left  and  right  singular 

veetors.  Now,  sinee  i|iy.  Equation  (27)  is  satisfied  if 


where  we  have  m  +  n  equations  and  n  unknowns  (the  length  of  the  i|iy  atom  of  dietionary  V).  Note  here 
the  importanee  of  eoupling  the  original  images  X  and  their  eorresponding  sensed  data  Y,  if  we  would  have 
not  inelude  X,  (28)  would  have  m  equations  and  n  unknowns,  henee,  infinitely  many  solutions  for  i|iy 
would  satisfy  (28).  In  other  words,  by  introdueing  the  regularizing  eondition  that  ||X  —  V0|||*  must  also 
be  minimized,  we  obtain  a  unique  solution  to  (28)  that  best  fits  the  training  data  X  and  its  projeetion  Y. 
From  (28),  i|iy  ean  be  eomputed  using  the  pseudo-inverse  as 


ipy  =  (a^H- d>'^d>)  ^(ai  (29) 

Sinee  i|iy  eomputed  using  (29)  does  not  neeessarily  have  unit  ^^-norm,  and  the  eolumns  i|iy  of  the  learned 
dietionary  V  should  have  unit  ^^-norm  [42],  we  redefine  i|iy  and  8y  r  as. 


in  order  to  keep  the  produet  on  (27)  unehanged. 


Sy,R , 


(30) 


We  have  now  updated  the  dietionary  V  and  the  eorresponding  sparse  eoeffieients  0  (repeating,  as  in  the 
KSVD,  the  above  proeedure  for  all  the  atoms),  eonsidering  4>  fix.  The  feedbaek  of  the  sensing  matrix  into 
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the  update  of  the  dietionary  is  provided  using  4>(V)  as  defined  in  Seetion  II,  i.e.,  learning  the  projeetion 
matrix  from  the  just  updated  dietionary  V.  In  turn,  V  will  be  affeeted  by  4>,  as  indieated  on  (28),  in  the 
next  iteration.  Additionally,  OMP  also  uses  the  eoupled  signal  Z  to  estimate  0.  Thereby,  the  whole 
learning  proeedure  is  simultaneous  and  exploits  all  the  available  data. 

In  summary,  the  proposed  eoupled-KSVD  algorithm  is, 

1 .  Initialize  V. 

2.  Repeat  until  eonvergenee: 

•  Eor  V  fixed,  eompute  4>(v)  using  the  algorithm  given  on  Seetion  II. 

•  For  V,  4>  fixed,  solve  (21)  using  OMP  to  obtain  0,  i.e.,  0  =  OMP(Deq,Z). 

•  For  j  =  \  io  K 

Compute  cdj  and  from  there  Ef,  using  (22)-(24). 

Obtain  the  largest  singular  value  ofEf  and  the  eorresponding  singular  veetors. 

Update  V  and  0  using  (28),  (29)  and  (30). 

In  the  next  seetion  we  evaluate  the  performanee  of  the  improved  algorithm  to  eompute  the  sensing 
projeetion  matrix  4>(V)  from  a  given  sparsifying  dietionary  V,  and  of  the  eoupled-KSVD  just 
introdueed,  and  show  their  advantage  over  previously  reported  teehniques. 

IV.  Experimental  Results 

In  this  seetion,  we  eompare  different  methods  to  eompute  the  dietionary,  V,  and  sampling  matrix,  4>, 
oriented  to  retrieve  the  original  image  patehes  from  their  linear  measurements.^^  The  methods  eonsidered 
are  the  elassieal  training  of  a  dietionary  using  KSVD,  eoupled  with  random  sampling  matriees  as 
eommonly  used  in  CS;  the  proposed  improved  algorithm  to  learn  4>  from  the  already  KSVD  learned 
dietionary  V  (Seetion  II);  and  the  new  eoupled-KSVD  algorithm  where  we  simultaneously  learn  both  V 
and  4>  from  the  data  available  (Seetion  III).  In  partieular,  we  eompare  the  retrieval  error  of  testing  patehes 
X  extraeted  from  real  images,  and  reeonstrueted  using  OMP^^  with  the  equivalent  dietionary  D  =  4>V 


Recall  that  in  the  real  CS-type  scenario,  once  we  have  already  learned  the  dictionary  and  sensing  matrix,  we 
have  to  recover  the  signal  only  from  its  linear  projections. 

To  be  consistent  with  the  KSVD-based  dictionary/sensing  training,  we  use  OMP  at  the  reconstmction  step  as 


well.  As  previously  mentioned,  we  can  replace  this  by  other  sparsifying  techniques. 
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from  their  noisy  linear  samples,  Y  =  OX  +  ?/,  for  the  following  (O,  O)  strategies: 

•  A  dietionary  O,  learned  from  the  real  image  data  using  the  standard  KSVD,  and  a  Gaussian  random 
sampling  matrix  O.  This  represents  the  standard  CS  seenario. 

•  A  dietionary  O,  learned  from  the  real  image  data  using  the  standard  KSVD,  and  then  eombined 
with  the  optimized  sensing  matrix  0(0)  eomputed  from  the  dietionary,  as  indieated  on  Seetion  IT 

•  A  dietionary  O,  learned  from  the  data  using  the  eoupled-KSVD  with  a  fix  Gaussian  random 
sampling  matrix  O/^ 

•  A  dietionary  O  and  sampling  matrix  0(0),  both  learned  from  the  data  using  the  full  eoupled- 
KSVD. 

The  first  two  strategies  are  uneoupled,  sinee  the  dietionary  O  is  learned  using  the  elassieal  KSVD, 
independently  of  O.  The  third  strategy  is  semi-eoupled,  sinee  the  sampling  projeetion  matrix  O  affeets 
the  learning  proeess  of  O  through  the  samples  Y  (Seetion  III),  but  not  viee  versa.  The  fourth  strategy  is 
eompletely  eoupled,  sinee  both  O  and  O  affeet  eaeh  other  during  the  learning  proeess:  O  affeets  the 
learning  of  O  through  the  samples  Y,  and  in  turn,  O  affeets  the  sampling  matrix,  sinee  O  depends  on 
OO^  (Seetion  II).  In  the  following  we  refer  to  eaeh  strategy  as  uneoupled  random  (UR),  uneoupled 
learning  (UL),  eoupled  random  (CR),  and  eoupled  learning  (CL),  respeetively. 

The  training  data  eonsists  of  6600  8  x  8  patehes  obtained  by  extraeting  at  random  15  patehes  from  eaeh 
one  of  the  440  images  in  the  training  set  (250  images  from  the  Berkeley  segmentation  data  set  [47]  and 
190  images  from  the  Labelme  data  set  [48]).  The  testing  data  eonsists  of  120000  patehes  eorresponding  to 
all  the  non-overlapping  patehes  of  size  8x8  extraeted  from  the  remaining  50  images  in  the  Berkeley 
dataset  that  are  not  in  the  training  set. 

The  different  strategies  (UR,  UL,  CR,  and  CL),  are  evaluated  in  terms  of  the  MSE  of  retrieval,  defined  as 
MSE  :=  ||X  —  V0|||*,  where  V  is  the  dietionary  learned  from  the  training  patehes,  X  is  the  matrix  of 
testing  patehes,  and  0  is  obtained  using  OMP  to  solve 

min4Y  -  s.t.  Vi,||0J|^„  <S,  (31) 

This  means  that  we  incorporate  the  sensing  in  the  learning  of  the  dictionary,  but  do  not  update  the  sensing 
matrix  and  keep  it  constant  during  the  iterations  of  the  eoupled-KSVD  algorithm. 
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i.e.,  0  =  0MP(D,  Y),  where,  O  is  a  random  sampling  matrix  for  the  UR  and  CR  strategies  and  a  learned 
sampling  matrix  for  the  UL  and  CL  strategies,  and  Y  is  a  noisy  version  of  the  projeeted  patehes,  as 

defined  in  (20).  The  noise  added  to  the  samples  ranges  from  0  to  25%  in  amplitude,  for  eaeh  pateh,  with 

{1111  1  1  11 
2'4'8  16  32^64  %28j 

We  use  typieal  values  for  the  other  algorithm  parameters:  sparsity  S  =  4,  5,  and  6;  sampling  dimension  m 
=  2*5,  2.5*5,  and  3*5;  pateh  dimension  n  =64  (8x8);  and  overeompleteness  ofK  =  4n.  These  values  are 
eommonly  used  in  learning  overeomplete  dietionaries  and  Compressive  Sensing  (see,  for  instanee  [l]-[8], 
[12],  [18],  [42]).  We  inelude  here  representative  results  from  this  large  set  of  possible  parameter 
eombinations,  see  the  supplementary  material  for  numerous  additional  graphs  and  tables. 

Eigure  2  eompares  the  average  MSE  of  retrieval  for  the  testing  patehes  using  *5=6,  m=12,  ^=64,  and 
K=256;  at  different  values  of  a  and  noise  level,  for  the  four  training  strategies.  We  elearly  observe  the 
signifieant  advantage  of  learning  the  sensing  matrix  (eoupled  or  uneoupled  from  the  dietionary)  over  the 
more  standard  use  of  random  projeetions. 


a:  1/4  1/8  1/16  1/32  1/64  1/128 

Figure  2:  Retrieval  MSE  using  the  four  training  strategies  at  different  noise  levels  and  values  of  a,  for  S=6,  m 
=12,  n  =  64,  and  K=256, 

Figure  3a  shows  the  retrieval  MSE  of  CL  relative  to  CR.  Note  that  the  MSE  using  eoupled  learning  (CL) 
is  almost  50%  (a  reduetion  of  ~3  Db)  of  the  MSE  using  semi-eoupled  learning,  with  a  random  projeetion 
matrix  (as  eommon  in  standard  CS).  The  differenee  between  CR  and  CL  reduees  as  the  noise  level 
inereases.  Nevertheless,  for  noise  levels  below  20%,  the  MSE  of  CL  is  at  least  30%  lower  than  CR  for  a 
>  !4.  Figure  3b  shows  the  retrieval  MSE  of  CL  relative  to  UL.  Here,  the  advantage  of  CL  over  UL  is 
lower  than  in  the  previous  ease  (Figure  3a),  whieh  indieates  that  a  well-designed  projeetion  matrix  as 
introdueed  in  Seetion  II,  learned  from  the  dietionary  V,  ean  do  better  than  simply  eoupling  the  data  using 
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a  random  sampling  matrix  (see  also  Figure  2).  However,  CL  ean  still  reduee  the  MSE  with  respeet  to  UL 
as  mueh  as  20%  (a  reduetion  of -IDb),  whieh  justifies  its  use. 

From  Figure  2  and  Figure  3  it  is  also  elear  that  the  retrieval  MSE  and  CL/CR  and  CL/UL  ratios  have  a 
1 

minimum  at  about  a  =  — .We  eomment  more  on  this  at  the  end  of  this  seetion. 


a:  1/4  1/8  1/16  1/32  1/64  1/128  a:  1/4  1/8  1/16  1/32  1/64  1/128 

(a)  (b) 

Figure  3:  For  S=6,  m  =12,  n  =  64,  and  K=256,  different  noise  levels  and  values  of  a,  a)  Ratio  between  the 
retrieval  MSE  for  CL  and  the  retrieval  MSE  for  CR,  b)  Ratio  between  the  retrieval  MSE  for  CL  and  the  retrieval 
MSE  for  UL. 

Figure  4  and  Figure  5  show  the  retrieval  MSE  and  the  CL/CR,  CL/UL  ratios  for  K=  64,  i.e.,  for  a 
dietionary  that  is  also  a  basis  for  the  veetor  spaee  of  image  patehes.  This  is  an  interesting  experiment, 
sinee  the  dietionary  now  is  not  overeomplete,  being  more  in  agreement  with  the  majority  of  the 
theoretieal  results  from  the  CS  framework.  These  figures  show  that  the  proposed  framework  is  also  valid 
within  this  seenario,  and  as  ean  be  appreeiated  in  these  figures,  the  CL/CR  and  CL/UL  ratios  are  even 
better  than  for  the  overeomplete  ease  (Figure  2  and  Figure  3),  indieating  that  the  proposed  eoupled 
learning  of  both  V  and  O  ean  be  even  more  influential  with  non-overeomplete  dietionaries. 

Due  to  spaee  limitations,  we  eannot  show  here  the  results  of  all  our  experiments  with  all  possible 
parameter  variations  (again,  see  supplementary  material).  However,  a  set  of  representative  results  is 
shown  in  Table  2,  whieh  indieates  the  best  values  of  a  that  produeed  the  minimum  retrieval  MSE  and  at 
the  same  time  the  best  CL/CR  and  CL/UL  ratios,  for  a  representative  noise  level  of  5%.  In  general,  for 
overeomplete  dietionaries  with  K  =  An,  we  found  that  the  best  values  for  a  are  — ,  — ,  and  -  for  m  =  IS, 

32  16  8 

2.5S,  and  3*S,  respeetively,  whieh  indieates  that  as  the  number  of  samples  is  redueed,  Y  must  have  greater 
importanee  than  X  in  the  optimization.  However,  as  detailed  before,  X  has  an  important  role  in  the 
eoupling  algorithm,  to  limit  the  number  of  possible  solutions  of  the  under-determined  inversion  problem 
with  m«n,  and  given  that  in  praetiee  m  >  2S,  then  we  must  have  a  >  0. 
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Ot:  1/2  1/4  1/8  1/16  1/32  1/64  1/128 

Figure  4:  MSE  of  retrieval  from  projections  using  the  four  training  strategies  at  different  noise  levels  and  values 
of  oi,  for  S=6,  m  =12,  n  =  64,  and  K=64, 


u 


noise  (%) 


a:  1/2  1/4  1/8  1/16  1/32  1/64  1/128 

(b) 


Figure  5:  For  S=6,  m  =12,  n  =  64,  and  K=64,  different  noise  levels  and  values  of  a,  a)  Ratio  between  the  MSE  of 


retrieval  from  projections  for  CL  and  MSE  of  retrieval  for  CR,  b)  Ratio  between  the  MSE  of  retrieval  from 
projection  for  CL  and  the  MSE  of  retrieval  for  UL. 


Table  2:  Sample  results  for  the  best  values  of  a  for  a  noise  level  of  5%. 


K 

5 

m 

QL 

MSE 

CL/CR 

CL/UL 

UR 

UL 

CR 

CL 

mean 

std 

mean 

std 

8 

1/32 

0.5306 

0.2154 

0.3872 

0.1890 

0.4823 

0.0880 

0.8805 

0.0317 

4 

10 

1/16 

0.4980 

0.1981 

0.3113 

0.1820 

0.5746 

0.0633 

0.9215 

0.0285 

12 

1/8 

0.4754 

0.1838 

0.2605 

0.1731 

0.6578 

0.0451 

0.9402 

0.0266 

10 

1/32 

0.5029 

0.1926 

0.2827 

0.1646 

0.5720 

0.0622 

0.8497 

0.0308 

256 

5 

12 

1/16 

0.4781 

0.1782 

0.2708 

0.1614 

0.5888 

0.0626 

0.9193 

0.0235 

15 

1/8 

0.3951 

0.1626 

0.2303 

0.1456 

0.6267 

0.0381 

0.9092 

0.0292 

12 

1/32 

0.5094 

0.1848 

0.2659 

0.1465 

0.5436 

0.0569 

0.8090 

0.0354 

6 

15 

1/16 

0.4203 

0.1633 

0.2254 

0.1375 

0.6016 

0.0452 

0.8399 

0.0273 

18 

1/8 

0.3974 

0.1477 

0.2033 

0.1287 

0.6294 

0.0261 

0.8570 

0.0266 

12 

1/8 

0.5801 

0.2275 

0.2865 

0.1409 

0.4814 

0.0661 

0.7107 

0.0569 

64 

6 

15 

1/4 

0.4601 

0.1929 

0.2241 

0.1241 

0.5483 

0.0499 

0.7357 

0.0521 

18 

1/2 

0.3994 

0.1838 

0.2210 

0.1243 

0.5568 

0.0400 

0.8026 

0.0314 

For  illustration  purposes,  Figure  6  shows  one  testing  image  eonsisting  of  non-overlapping  8x8  patehes 
reeonstrueted  from  their  noisy  projeetions  (5%  level  of  noise)  Y  as  X  =  V0,  where  0  =  OMP(D,  Y),  and 
are  obtained  using  either  UR,  UL,  CR,  or  CL  training  strategies.  The  worst  reeonstruetion  ease 
(Figure  6a)  is  obtained  when  V  is  learned  using  elassieal  KSVD  and  4>  is  simply  a  random  sampling 
matrix  (UR,  standard  CS  seenario),  followed  by  eoupled-KSVD  using  a  random  sampling  matrix  (CR, 
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Figure  6c).  Using  a  well-designed  sampling  matrix  4>(V)  (UL,  Section  II)  produces  a  good  looking 
reconstruction  of  patches  from  their  noisy  projections  (Figure  6b).  Finally,  using  the  coupled-KSVD  (CL, 
Section  III),  an  even  better  reconstruction  from  the  noisy  projections  is  obtained  (Figure  6d).  The  better 
quality  of  Figure  6d  over  Figure  6b  can  be  appreciated  by  the  reduction  of  the  artifacts,  especially  around 
the  sharp  edges  on  the  top  of  the  castle.  Additional  examples  for  the  UR  (standard  CS  framework)  and  the 
proposed  CL  technique  are  provided  in  Figure  7  (additional  details  on  these  figures,  including  the  CR  and 
UL  cases,  are  included  with  the  supplementary  material). 


(a)  (b)  (c)  (d) 


Figure  6:  One  test  image  reconstructed  from  projected  patches,  without  overlapping,  with  a  noise  level  of  5%, 
using  a)  Uncoupled  Random,  b)  Uncoupled  Learning,  c)  Coupled  Random,  and  d)  Coupled  Learning  strategies 
to  learn  the  dictionary  and  the  projection  matrix  from  the  training  patches.  The  retrieval  MSE  for  these  images  is 
a)  1.1528,  b)  0.4548,  c)  0.6721,  and  d)  0.3769. 

We  have  just  presented  the  improved  performance  in  terms  of  MSE  and  quality  of  the  reconstructed 
patches,  when  using  a  well  designed  sampling  matrix  instead  of  random  projection  matrices,  and  also 
when  exploiting  coupled  over  uncoupled  learning.  Let  us  now  compare  the  dictionaries  learned  from 
classic  KSVD  vs.  coupled-KSVD  with  a  random  matrix  and  with  a  learned  sampling  matrix.  We  also 
want  to  compare  the  random  projection  matrices  vs.  the  sampling  matrices  computed  using  the  proposed 
uncoupled  and  coupled  learning  techniques.  Let  us  start  by  comparing  the  equivalent  dictionaries 
D  =  4>V  for  the  four  strategies  considered  here,  in  terms  of  the  closeness  of  the  Gramm  matrix  G  =  D^D 
(see  Section  II)  to  the  identity  (as  inspired  by  the  RIP). 
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(d)  (e)  (f) 


Figure  7:  Additional  examples  of  image  recovery.  Note  the  sharp  improvement  with  our  proposed  CL  framework 
(bottom  row)  when  compared  to  the  more  classical  UR  scenario  from  CS  (top  row).  The  retrieval  MSE  for  these 
images  is  a)  1.0539,  b)  2.3455,  c)  0.8207,  d)  0.2389,  e)  0.6707,  and  f)  0.2204.  While  here  we  sample  at  twice  the 
sparsity  rate,  even  sampling  at  four  times  the  sparsity,  the  UR  results  are  far  from  the  CL  ones  at  just  twice  the 
sparsity,  both  in  visual  quality  and  MSE  (e.g.,  the  MSE  for  the  image  in  b)  becomes  1.1707,  while  better  than  the 
2.3455,  still  more  than  double  the  0.6707  MSE  obtained  with  the  proposed  approach  at  half  the  sampling  rate). 


15000 


Figure  8:  Distribution  of  the  off-diagonal  elements  of  the  Gramm  matrix  obtained  using  UR,  UL,  CR  and  CL 
strategies. 


Figure  8  shows  the  distribution  of  the  off-diagonal  elements  of  the  Gramm  matrix  for  eaeh  one  of  the 
strategies.  The  Gramm  matrix  of  CR  is  eloser  to  the  identity  than  the  Gramm  matrix  of  UR,  and  the 
Gramm  matrix  of  both  UL  and  CL  are  eloser  to  the  identity  than  UR  and  UL  respeetively,  but  they  are 
almost  undistinguishable  among  themselves  in  terms  of  the  distribution  of  the  off-diagonal  elements. 
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These  results  are  in  agreement  with  the  improved  performanee  observed  for  the  UL  and  CL  methods  over 
the  other  two  possible  strategies  that  use  random  sampling  matriees. 


Figure  9:  Visual  representation  of  the  learned  dictionaries  (above)  and  projection  matrices  (below)  for  the 
different  learning  strategies. 


Figure  9  a),  b)  and  e)  show  the  K=256  atoms  of  the  dietionaries  learned  with  KSVD  (used  for  both  UR 
and  UL),  eoupled-KSVD  with  a  random  sampling  matrix  (CR),  and  eoupled  KSVD  with  a  learned 
sampling  matrix  (CL),  respeetively,  represented  eonveniently  here  as =  16x16  images  of  size  8x8. 
We  elearly  see  that  the  learned  dietionaries  are  different.  In  order  to  eompare  the  sampling  matriees  O  for 
the  different  learning  strategies  eonsidered  here,  we  use  the  same  approaeh  proposed  by  [38],  i.e.,  good 
sampling  matriees  should  produee  signals  that  are  as  spread  as  possible  in  Figure  9  d),  e)  and  f)  show 
seatter  plots  of  the  first  row  of  Y  =  OX  vs.  rows  two  to  five  (for  random,  uneoupled  following  Seetion  II, 
and  eoupled  following  Seetion  III,  respeetively).  It  ean  be  notieed  here  that  UL  produees  samples  Y  with 
a  larger  spread  in  than  UR,  and  in  turn  CL  produees  samples  that  are  more  spread  in  than  UL 
(notiee  on  Figure  9  d),  e)  and  f)  the  ehange  of  seale  on  some  seatter  plots).  This  behavior  also  oeeurs  for 
all  the  possible  seatter  plots  among  rows  in  Y,  but  for  limitations  of  spaee,  we  only  present  here  the  first 
four. 

From  Figure  9  one  observes  that  besides  making  the  Gramm  matrix  eloser  to  the  identity,  the  learned 
dietionaries  should  also  be  able  of  learning  new  patterns  present  only  in  the  projeeted  signals  (and  not  in 
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the  signal  itself),  and  eoupled-KSVD  helps  to  introduee  those  new  patterns.  In  addition,  a  well-designed 
sampling  matrix  ean  improve  the  spread  of  the  projeeted  signals,  when  eompared  to  a  random  sampling 
matrix,  whieh  in  turn  improves  the  retrieval  of  the  original  signal  from  projeetions,  thanks  to  the  larger 
separability  of  those  signals  in  More  formally,  4>  should  maximize  the  mutual  information  I(X,  Y), 
between  the  signal  X  and  its  noisy  projeetion  Y,  whieh  is  equivalent  to  maximize  the  entropy  of  the 
output,  H(Y),  in  order  to  minimize  the  retrieval  error  of  X  from  its  noisy  projeetions  Y  [38],  [49]. 

V.  Concluding  Remarks 

A  eomputational  framework  for  learning  an  optimal  sensing  matrix  for  a  given  sparsifying  dietionary  was 
introdueed  in  this  paper.  This  was  eomplemented  by  a  novel  approaeh  to  simultaneously  learn  the  sensing 
matrix  and  sparsifying  dietionary  from  an  image  database.  We  showed  that  sueh  learning  leads  to 
signifieantly  improved  reeonstruetion  results  when  eompared  with  more  elassieal  eompressed  sensing 
seenarios  where  random  sensing  matriees  are  used.  The  same  framework  ean  be  used  to  learn  the 
sparsifying  dietionary  while  keeping  the  sensing  matrix  fix  (see  also  [50]). 

As  mentioned  in  the  introduetion,  the  theoretieal  results  for  CS  support  the  use  of  optimization,  while 
KSVD-type  of  algorithms  have  traditionally  been  based  on  OMP  (for  whieh  the  results  are  weaker).  It  is 
thereby  important  to  further  improve  the  results  here  presented  using  -based  optimization  approaehes. 
The  framework  here  developed  is  based  on  image  patehes,  as  eommonly  exploited  in  image  proeessing. 
While  in  prineiple  we  eould  work  with  entire  images,  this  is  eomputationally  unfeasible.  Eor  tiny  32  X 
32  images,  following  [51],  we  obtain  results  eonsistent  with  the  work  reported  above  for  the  patehes,  see 
Figure  10.  Of  eourse,  images  are  mueh  larger  than  this,  and  algorithms  of  the  type  of  KSVD  as  here 
developed,  or  basieally  any  dietionary  learning  approaeh,  are  virtually  impossible.  On  the  other  hand, 
following  onee  again  the  state  of  the  art  results  for  image  enhaneement  via  KSVD,  we  should  work  with 
overlapping  patehes  (e.g.,  8  x  8  or  multiseale  up  to  20  x  20,  see  [52]).  Ideally,  we  would  like  then  to 
have  the  dietionary  aeting  on  all  the  overlapping  n  X  n  patehes,  with  a  unique  sensing  matrix  globally 
aeting  on  the  N  X  N,n  «  N,  image.  This  will  also  permit  to  naturally  inelude  the  multiseale  framework 
developed  in  [52].  Results  in  this  direetion,  as  well  as  in  the  adaptation  of  the  sensing  to  the  task 
following  [22],  will  be  reported  elsewhere. 
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Figure  10:  Sample  MSE  results  for  tiny  images  (left  CL/CR,  right  CL/UL),  which  are  consistent  with  those 
reported  before  for  patches  (S  =  125,  m  =  250,  a  =  1/32). 
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